https://Carolinah23.github.io

MILESTONE 2 - Analysing data from Water Wells in Baton Rouge, LA

Diana Carolina Hurtado Pulido
November/18/2021

CMPS 3160/6160

Project Description and Goals

Currently, I am working on finding subsidence rates in Baton Rouge, Louisiana during the last two decades, and investigating what factors are causing subsidence using LiDAR (Light Detection And Ranging) data from 1999 and 2018.

Subsidence is the vertical land movement caused by natural and anthropogenic factors such as sediment compaction, isostatic adjustments, fault slip, and extraction or injection of fluids. The Gulf of Mexico coastline is under constant monitoring due to high rates of sea level rise and subsidence, which cause rapid land loss. The study area is not on the coast, yet it is subsiding (Figure 1). This area is of particular interest because there has not been enough research to determine what factors are causing subsidence. It has two geological faults, growing urban development, almost 2000 water wells (with different uses) active during the study period, approximately 40 active oil and gas wells, and 11 injection wells.

So far, my results show subsidence in the whole region (Figure 1). Surprisingly, subsidence increases from south to north, which is the opposite of the expected result: areas closer to the coast and water bodies have large subsidence rates due to the compaction of younger sediments and, most importantly in this area, the faults move toward the south. Interestingly, small areas show localized subsidence and uplift; this behavior is likely related to human activities. Given these results, the main goal of this project is to find out how these subsidence values are related to groundwater extraction. Important questions:

  1. Do clusters of wells occur in areas that are subsiding or uplifting locally?
  2. Are deeper wells causing more subsidence than wells at shallow depths?
  3. Is there any particular well use that may be causing more vertical changes?
  4. Using the wells with yield (extraction rate), are high yield values related to subsidence?

Figure 1: Relative subsidence in Baton Rouge using LiDAR differencing between 1999 and 2018. Each map shows a different method applied to find elevation changes (result of my research).

Extract, Transform, and Load (ETL)

Data description

Datasets 1 and 2 come from the Louisiana Department of Natural Resources. Dataset 1 (Wells_df) corresponds to the water wells operating in the area during the study period (1999 to 2018). A collaborator and I collected this data last year (2020) and calculated the well depth in meters. The data is not complete: some values were not published when the well owners uploaded the information, and in some cases the information is too old. Dataset 2 (locations_df) contains the coordinates in NAD83(2011) / UTM zone 15N and other data about the water wells; only some general information will be kept for this analysis.
Dataset 3 (varZ_df) contains the elevation changes calculated in my research between 1999 and 2018 on a 100-meter grid, in the same coordinate system as locations_df.

For my analysis I will use the following variables in Wells_df: Well Depth (in meters), Well Use, Yield (extraction rate, in gallons per minute), the date of construction, and the last date the well was active.
From locations_df: coordinates X and Y, LocalWellNumber (to merge the data with the first dataset), Water table depth (how deep underground water is found, in meters), and Aquifer Name.
From varZ_df: coordinates X and Y, and the average elevation change at each point.

Loading data:

I am loading the data using read_csv().
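A minimal sketch of the loading step, assuming pandas. The CSV content and well IDs below are made up for illustration; in the project, the argument to read_csv() would be the path to the real file rather than an in-memory buffer.

```python
import io
import pandas as pd

# Hypothetical stand-in for one of the CSV files (values are invented).
sample_csv = io.StringIO(
    "LocalWellNumber,Well_Depth_Meters,Yield,Active\n"
    "W-1,45.7,120,Y\n"
    "W-2,NN,,N\n"
)

# read_csv() returns a DataFrame; empty fields are read as NaN.
Wells_df = pd.read_csv(sample_csv)
print(Wells_df.shape)
print(Wells_df.head())
```

Note that the "NN" marker keeps the depth column as text at this point, which is why the tidying steps are needed afterwards.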

Tidy data

  1. Dropping unused variables
  2. Replacing incorrect or empty values with NaN
  3. Converting data to appropriate types
  4. Renaming and organizing columns

Wells_df

Wells_df has two variables that I will not use: WellDepth and SerialNumber. The first will not be used because I will use the depth in meters instead, and the second is not the main identifier of the observations and is incomplete.
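Dropping the two columns could look like the sketch below. The column names WellDepth, SerialNumber, LocalWellNumber, and Well_Depth_Meters come from the text; the rows are invented.

```python
import pandas as pd

# Invented rows; only the column names follow the text.
wells = pd.DataFrame({
    "LocalWellNumber": ["W-1", "W-2"],
    "SerialNumber": [None, 17],          # incomplete, not the main identifier
    "WellDepth": [150.0, 300.0],         # original depth column, not used
    "Well_Depth_Meters": [45.72, 91.44],
})

# Remove the columns that are not needed for the analysis.
wells = wells.drop(columns=["WellDepth", "SerialNumber"])
print(wells.columns.tolist())
```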

There are non-numeric values in the Yield variable, and "NN" values in LastActive_Plugged_date and DateConstructed, that must be changed to NaN using np.nan.
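One possible way to do this replacement, shown on invented values. Using pd.to_numeric with errors="coerce" for the non-numeric Yield entries is an assumption about the approach, not necessarily what the original cell does.

```python
import numpy as np
import pandas as pd

# Invented values that mimic the problems described in the text.
wells = pd.DataFrame({
    "Yield": ["120", "unknown", "35"],
    "DateConstructed": ["1985-06-01", "NN", "2003-11-15"],
})

# Replace the "NN" marker with a proper missing value.
wells = wells.replace("NN", np.nan)

# Any Yield entry that cannot be parsed as a number becomes NaN.
wells["Yield"] = pd.to_numeric(wells["Yield"], errors="coerce")
print(wells)
```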

The cell below shows the variable types. I converted Last_Active_Plugged_date and DateConstructed to datetime, and Yield and Well_Depth_Meters to numeric.
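A sketch of the type conversion on invented values, assuming the standard pandas converters pd.to_datetime and pd.to_numeric:

```python
import numpy as np
import pandas as pd

# Invented values; columns arrive as text after loading.
wells = pd.DataFrame({
    "DateConstructed": ["1985-06-01", np.nan],
    "Well_Depth_Meters": ["45.72", "91.44"],
})

# Missing dates become NaT, and depths become floats.
wells["DateConstructed"] = pd.to_datetime(wells["DateConstructed"], errors="coerce")
wells["Well_Depth_Meters"] = pd.to_numeric(wells["Well_Depth_Meters"], errors="coerce")
print(wells.dtypes)
```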

Finally, I am replacing Y and N (Yes and No) with 1 and 0 in the Active column. This variable indicates whether the well was actively extracting groundwater on the date we collected the data. For instance, a well that was active only between 2001 and 2014 should have an N; a well that is being used to monitor groundwater should also be marked inactive.
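The Y/N recoding can be sketched with Series.map; the rows are invented.

```python
import pandas as pd

wells = pd.DataFrame({"Active": ["Y", "N", "Y"]})

# Map the Yes/No flags to 1/0 so the column can be summed or averaged.
wells["Active"] = wells["Active"].map({"Y": 1, "N": 0})
print(wells["Active"].tolist())
```

Any value other than Y or N would become NaN with this mapping, which makes unexpected codes easy to spot.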

locations_df

First, I will convert the water table depth to meters because it is in feet. With this variable in place, I will drop the variables that are not needed in this analysis and keep only the ones mentioned above.
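A sketch of the unit conversion, using the exact factor 1 ft = 0.3048 m. The column names WaterTableDepth_ft, WaterTableDepth_m, and Owner are hypothetical, as are the values.

```python
import pandas as pd

FEET_TO_METERS = 0.3048  # exact definition of the international foot

# Hypothetical column names and invented values.
locations = pd.DataFrame({
    "LocalWellNumber": ["W-1", "W-2"],
    "WaterTableDepth_ft": [100.0, 250.0],
    "Owner": ["a", "b"],  # example of a column not needed later
})

# Convert to meters, then drop what is no longer needed.
locations["WaterTableDepth_m"] = locations["WaterTableDepth_ft"] * FEET_TO_METERS
locations = locations.drop(columns=["WaterTableDepth_ft", "Owner"])
print(locations)
```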

Now, I change the name of the identifier with rename so that both tables share the same column, and I reorganize the columns for readability.
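A sketch of the rename and reordering. The original identifier name WELL_NUM and the aquifer labels are hypothetical; LocalWellNumber, X, Y, and Aquifer Name follow the text.

```python
import pandas as pd

# WELL_NUM is a hypothetical original column name.
locations = pd.DataFrame({
    "X": [675000.0, 680000.0],
    "Y": [3365000.0, 3370000.0],
    "Aquifer Name": ["Aquifer A", "Aquifer B"],
    "WELL_NUM": ["W-1", "W-2"],
})

# Give the identifier the same name it has in Wells_df.
locations = locations.rename(columns={"WELL_NUM": "LocalWellNumber"})

# Put the identifier first, followed by the coordinates.
locations = locations[["LocalWellNumber", "X", "Y", "Aquifer Name"]]
print(locations.columns.tolist())
```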

varZ_df

Exploratory Data Analysis

For this second milestone, I am doing some analysis to see how my data is distributed across different variables.

First, I will merge the table of well information with the table of well locations. These dataframes have different sizes because locations_df has the information of all the wells that have been in the area since data collection started, while Wells_df only has data for wells that were active at some point between 1999 and 2018.
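The merge can be sketched as follows. A left join on LocalWellNumber is assumed here so that only the wells in Wells_df are kept; the rows and coordinates are invented.

```python
import pandas as pd

# Invented rows: locations_df has one extra well that is not in Wells_df.
Wells_df = pd.DataFrame({
    "LocalWellNumber": ["W-1", "W-2"],
    "Well_Depth_Meters": [45.0, 120.0],
})
locations_df = pd.DataFrame({
    "LocalWellNumber": ["W-1", "W-2", "W-3"],
    "X": [675000.0, 680000.0, 687500.0],
    "Y": [3365000.0, 3370000.0, 3380000.0],
})

# Keep every well in Wells_df and attach its location;
# validate raises an error if an identifier is duplicated on either side.
merged = Wells_df.merge(
    locations_df, on="LocalWellNumber", how="left", validate="one_to_one"
)
print(merged)
```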

Next, I want to know the distribution of well depth and water table depth. My first guess is that they should follow a similar distribution: if the wells are meant to extract water, they should be at least as deep as the water table, or a little deeper.
Both variables are skewed to the right, so I applied a transformation to the variables in the graph using the Ladder of Powers with an exponent of 0.3 in both cases. For example, a well with a depth of 10 meters appears in the graph as 1.995...

The graph above shows that the two variables do not have distributions as similar as I expected. Water table depth is concentrated between 0 and 4 (0 to 102 meters depth), but the majority of wells only reach 40 meters, meaning that many do not reach the water table. Maybe the wells that do not reach the water table are used for other purposes, or one (or both) of the datasets may not have been filled in correctly in the first place. We found that the well data is not well maintained. There are clearly more water wells at shallower depths, where the sediment may be less compacted. The study area is on stable sediments, but changes in volume are still possible.
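The power transformation and its inverse can be checked numerically. The sketch below uses NumPy and a few sample depths; it only reproduces the arithmetic quoted in the text (10 m maps to about 1.995, and a transformed value of 4 corresponds to about 102 m).

```python
import numpy as np

EXPONENT = 0.3  # Ladder of Powers exponent used in the graph

# Sample depths in meters (illustrative values).
depth_m = np.array([10.0, 40.0, 102.0])

# Forward transformation used for plotting.
transformed = depth_m ** EXPONENT

# Inverse transformation, to read axis values back in meters.
recovered = transformed ** (1 / EXPONENT)
axis_value_4_in_m = 4.0 ** (1 / EXPONENT)

print(transformed.round(3))
print(round(axis_value_4_in_m, 1))
```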


Now, I want to know how well installation and deactivation are distributed over time. Are old wells still active? When were the wells installed, and how long have they been there?
The following graph shows that the construction of wells increased greatly in the '80s and has decreased recently; however, many of these wells are still active. Comparing the wells installed before 1980 with the wells last active between 1980 and 1990, I can see that the rates of installation and deactivation are not even close.


The following graph shows how deep the wells that exist (or have been active during the study period) are for each use.
Public supply has the deepest wells; these wells provide water to water and sewage users in East Baton Rouge. The next deepest are Observation wells and Industrial wells. Most of the wells in these categories are therefore in areas where the water table is deep, even though such depths are uncommon (first graph).
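A numeric companion to this comparison could be a grouped summary of depth per use. This is a sketch on invented rows: the "Domestic" category and all depths are made up, and the column name "Well Use" is assumed from the variable list.

```python
import pandas as pd

# Invented rows; "Domestic" is a hypothetical use category.
wells = pd.DataFrame({
    "Well Use": ["Public supply", "Public supply", "Domestic", "Domestic", "Industrial"],
    "Well_Depth_Meters": [600.0, 700.0, 30.0, 50.0, 400.0],
})

# Median depth per use, deepest first.
depth_by_use = (
    wells.groupby("Well Use")["Well_Depth_Meters"]
    .median()
    .sort_values(ascending=False)
)
print(depth_by_use)
```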

Now, I want to see how subsidence changes from south to north and from west to east. This graph will help define spatial trends that may not be easily observable in the map shown above.

The graph shows that there is more variation in elevation changes from south to north than from west to east. Looking at the top graph alone, the northern area reaches changes of -0.25 m (25 centimeters), which is unusual given that the southern area would be expected to have the more negative elevation changes. Along the horizontal coordinate, elevation change is more stable, except for the strong bumps at coordinate 687500, meaning that subsidence varies more along the south-north direction than horizontally. This trend has also been studied with permanent GPS stations in the area.
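One way to quantify this comparison is to average the elevation change along each coordinate and compare the spread of the two profiles. The sketch below uses a tiny invented grid, and the column name dZ for the average elevation change is hypothetical.

```python
import pandas as pd

# Tiny invented grid; dZ is a hypothetical name for elevation change in meters.
varZ = pd.DataFrame({
    "X": [0, 100, 0, 100],
    "Y": [0, 0, 100, 100],
    "dZ": [-0.05, -0.07, -0.20, -0.24],
})

# Mean elevation change per row (south-north) and per column (west-east).
profile_sn = varZ.groupby("Y")["dZ"].mean()
profile_we = varZ.groupby("X")["dZ"].mean()

# Spread of each profile: a larger range means a stronger spatial trend.
range_sn = profile_sn.max() - profile_sn.min()
range_we = profile_we.max() - profile_we.min()
print(round(range_sn, 3), round(range_we, 3))
```

With the real grid, the coordinates could first be grouped into wider bands (for example with pd.cut) so that each profile point averages many cells.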

In the following graph, I want to see how the wells are distributed in the area using the X and Y coordinates, and how deep they are. In the top graph, we see that deeper wells are located to the north, where there is more subsidence (strong negative elevation changes in the graph above). Most of the wells have a depth of less than 200 meters. Is this a hint that deeper wells relate to higher values of subsidence? There are also more wells in the northern area than in the south, which may point in the same direction.
On the other hand, the location of the wells along the X coordinate does not seem to indicate any relation with the elevation changes seen previously.

Model Questions

The analysis above brings up two important relationships that will be examined using statistical modeling: 1) the relationship between the calculated elevation changes and the location of water wells from south to north, where deeper wells apparently could cause more subsidence, and 2) Public supply, Industrial, and Observation wells are the deepest wells, so one of these uses may be causing more elevation changes.

To answer these questions I will do a statistical analysis to find the correlation between depth and elevation change in the study period. I will compare these results with the distribution of uses to try to identify which use causes more changes.
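A possible starting point for this correlation, sketched under strong assumptions: each well is assigned the elevation change of its nearest grid node by rounding its coordinates to the 100 m spacing (this assumes grid nodes sit at multiples of 100 m), and the Pearson correlation is then computed with Series.corr. All values and the column name dZ are invented.

```python
import pandas as pd

GRID = 100  # grid spacing in meters, as described for varZ_df

# Invented wells and grid cells; dZ is a hypothetical column name.
wells = pd.DataFrame({
    "X": [120.0, 260.0, 30.0, 310.0],
    "Y": [40.0, 180.0, 310.0, 290.0],
    "Well_Depth_Meters": [30.0, 80.0, 150.0, 200.0],
})
varZ = pd.DataFrame({
    "X": [100, 300, 0, 300],
    "Y": [0, 200, 300, 300],
    "dZ": [-0.05, -0.10, -0.18, -0.25],
})

# Snap each well to its nearest grid node (assumes nodes at multiples of GRID).
snapped = wells.assign(
    X=(wells["X"] / GRID).round().astype(int) * GRID,
    Y=(wells["Y"] / GRID).round().astype(int) * GRID,
)

# Attach the elevation change of that node to each well.
joined = snapped.merge(varZ, on=["X", "Y"], how="inner")

# Pearson correlation between depth and elevation change.
r = joined["Well_Depth_Meters"].corr(joined["dZ"])
print(len(joined), round(r, 3))
```

A negative coefficient here would mean that deeper wells coincide with more negative elevation change; it would not by itself show that the wells cause the subsidence.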

Also, I will use the Yield variable to find out whether the extraction rate is linked in some way to subsidence or whether depth is more important. This correlation analysis is limited because only 185 of the 1972 wells in the area have this information.
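To put this limitation in numbers, the share of wells with a Yield value can be computed directly from the counts given above. The small Series afterwards is invented and only illustrates how the same share could be obtained from a column with missing values.

```python
import numpy as np
import pandas as pd

# Counts taken from the text.
wells_with_yield = 185
total_wells = 1972
coverage = wells_with_yield / total_wells
print(f"{coverage:.1%}")  # about 9.4% of the wells

# Same idea on an invented column: fraction of non-missing entries.
yield_gpm = pd.Series([120.0, np.nan, np.nan, 35.0])
sample_coverage = yield_gpm.notna().mean()
print(sample_coverage)
```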